55 research outputs found

    Cache Coherence Protocols for Many-Core CMPs

    El proyecto de investigación: Un complemento eficaz en la docencia de Arquitectura de Computadores

    Computer Science degrees are still relatively young, so the teaching methodology needs continued improvement to help students acquire the knowledge associated with a course and, as a result, improve their academic performance. In this article we present the incorporation of a research project into the teaching methodology of the Computer Architecture course. We describe the practical details we applied to include this work in the course, and we report the opinions collected from students during this time. Overall, we rate our experience as very satisfactory: we have observed that students improve their knowledge of the subject while also increasing their academic performance and their interest in the concepts studied. Student opinion points the same way, with 84% of students giving a positive assessment.

    CELLO: Compiler-Assisted Efficient Load-Load Ordering in Data-Race-Free Regions

    Efficient Total Store Order (TSO) implementations allow loads to execute speculatively out of order. To detect order violations, the load queue (LQ) holds all in-flight loads and is searched on every invalidation and cache eviction. Moreover, in a simultaneous multithreading (SMT) processor, stores also search the LQ when writing to cache. LQ searches entail considerable energy consumption. Furthermore, the processor stalls when the LQ is full or its ports are busy. Hence, the LQ is a critical structure in terms of both energy and performance. In this work, we observe that the use of the LQ could be dramatically optimized under the guarantees of the data-race-free (DRF) property imposed by modern programming languages. To leverage this observation, we propose CELLO, a software-hardware co-design in which the compiler detects memory operations in DRF regions and the hardware optimizes their execution by safely skipping LQ searches without violating the TSO consistency model. Furthermore, CELLO allows DRF loads to be removed from the LQ earlier, as they do not need to be searched to detect consistency violations. With minimal hardware overhead, we show that an 8-core, 2-way SMT processor with CELLO avoids almost all conservative searches to the LQ and significantly reduces its occupancy. CELLO makes it possible (i) to reduce LQ energy expenditure by 33% on average (up to 53%) while performing 2.8% better on average (up to 18.6%) than the baseline system, and (ii) to shrink the LQ from 192 to only 80 entries, reducing LQ energy expenditure by as much as 69% while performing on par with a mainstream LQ implementation.
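
    The idea of skipping LQ searches for DRF loads can be illustrated with a toy model. This is only a sketch under stated assumptions, not the CELLO hardware: the `Load` record, its `drf` flag (standing in for whatever marking the compiler provides), and the function `search_on_invalidation` are hypothetical names introduced here, and the model counts address comparisons as a rough proxy for search energy.

```python
from dataclasses import dataclass


@dataclass
class Load:
    addr: int   # address read by the in-flight load
    drf: bool   # hypothetical flag: load was marked as lying in a DRF region


def search_on_invalidation(load_queue, invalidated_addr):
    """Toy LQ search: return (possible_violations, comparisons_performed).

    Loads marked DRF are not compared against the invalidated address,
    mirroring the abstract's claim that they need not be searched.
    """
    comparisons = 0
    violations = []
    for load in load_queue:
        if load.drf:
            continue  # skipped: no comparison is performed for this entry
        comparisons += 1
        if load.addr == invalidated_addr:
            violations.append(load)
    return violations, comparisons


lq = [Load(0x100, True), Load(0x200, False), Load(0x100, False)]
violations, comparisons = search_on_invalidation(lq, 0x100)
print(len(violations), comparisons)
```

    In this toy queue only the two non-DRF loads are compared, and only the non-DRF load to the invalidated address is reported as a possible violation.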

    On the Evaluation of x86 Web Servers Using Simics: Limitations and Trade-Offs

    In this paper, we present our first experiences using Simics, a simulator that allows full-system simulation of multiprocessor architectures. We carry out a detailed performance study of a static web content server, showing how changes in some architectural parameters affect final performance. The results corroborate the intuition that a dual-processor web server outperforms a single-processor one and, at the same time, allow us to examine the limitations of Simics. Finally, we compare these results with those obtained on real machines.

    Una Experiencia de iniciación al paralelismo en segundo curso del Grado de Ingeniería Informática

    This article analyzes an experience of introducing parallelism early in the Degree in Computing Engineering. Four second-year courses, taught by three different departments, take part in the experience, with the collaboration of a computing centre. In the second year the students are introduced to parallelism for the first time, and the experience aims to approach different topics of parallelism in a coordinated and practical way. Peer reviewed.

    Way Combination for an Adaptive and Scalable Coherence Directory

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    This manuscript opens the way to a new class of coherence directory structures based on the new concept of way combining. A Way-Combining Directory (WC-dir) builds on a typical sparse directory but allows several ways in the same set to be used to encode the sharing information of each memory block. The result is a sparse directory with variable effective associativity per set and variable-length entries, which can dynamically adapt the directory structure to the particular requirements of each application. In particular, our proposal uses just enough bits per entry to store a single pointer, which is optimal for the common case of having just one sharer. For those addresses that have more than one sharer, we have observed that in the majority of cases extra bits could be taken from other empty ways in the same set. All in all, our proposal minimizes the storage overheads without losing the flexibility to adapt to several sharing degrees and without the complexities of other previously proposed techniques. Detailed simulations of a 128-core multicore architecture running benchmarks from PARSEC-3.0 and SPLASH-3 demonstrate that WC-dir can closely approach the performance of a non-scalable bit-vector sparse directory, beating the state-of-the-art Scalable Coherence Directory (SCD) and Pool directory proposals.

    This work has been supported by the Spanish MCIU and AEI, as well as European Commission FEDER funds, under grant "RTI2018-098156-B-C53".

    Titos-Gil, R.; Flores, A.; Fernández-Pascual, R.; Ros, A.; Petit Martí, SV.; Sahuquillo Borrás, J.; Acacio, ME. (2019). Way Combination for an Adaptive and Scalable Coherence Directory. IEEE Transactions on Parallel and Distributed Systems. 30(11):2608-2623. https://doi.org/10.1109/TPDS.2019.2917185
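
    The way-combining idea in the WC-dir abstract can be illustrated with a toy model of one directory set. This is a minimal sketch under stated assumptions, not the published design: the class `WayCombiningSet` and its methods are hypothetical names, each way is assumed to hold one (block address, sharer) pointer, and the behaviour when no empty way remains is not described in the abstract, so the sketch simply reports failure.

```python
class WayCombiningSet:
    """Toy directory set in which each way stores a single sharer pointer."""

    def __init__(self, num_ways):
        # Each way is either None (empty) or a (block_addr, core_id) pair.
        self.ways = [None] * num_ways

    def sharers(self, addr):
        """Collect the sharers of a block from every way that encodes it."""
        return {w[1] for w in self.ways if w is not None and w[0] == addr}

    def add_sharer(self, addr, core):
        """Record a sharer, borrowing an empty way in the set if needed."""
        if core in self.sharers(addr):
            return True
        for i, way in enumerate(self.ways):
            if way is None:
                self.ways[i] = (addr, core)
                return True
        return False  # assumption: no empty way left, fallback not modelled

    def tracked_blocks(self):
        """Number of distinct blocks the set currently tracks."""
        return len({w[0] for w in self.ways if w is not None})


s = WayCombiningSet(num_ways=4)
s.add_sharer(0x40, 1)   # single sharer: one way, one pointer
s.add_sharer(0x40, 2)   # second sharer borrows an empty way
s.add_sharer(0x80, 3)
print(sorted(s.sharers(0x40)), s.tracked_blocks())
```

    Here a 4-way set tracks two blocks while using three ways, which is the sense in which the number of blocks a set can hold varies with the sharing degree.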

    Hardware transactional memory with software-defined conflicts

    In this paper we propose conflict-defined blocks, a programming language construct that allows programmers to change the concept of conflict from one transaction to another, or even throughout the course of the same transaction. Defining conflicts in software makes it possible to remove dependencies which, though not necessary for the correct execution of the transactions, arise as a result of the coarse synchronization style encouraged by TM. Programmers take advantage of their knowledge about the problem and specify through conflict-defined blocks what types of dependencies are superfluous in a certain part of the transaction, in order to extract more performance out of coarse-grained transactions without having to write minimally synchronized code. Our experiments with several transactional benchmarks reveal that, using software-defined conflicts, the programmer achieves significant reductions in the number of aborted transactions and improves scalability. Peer reviewed. Postprint (author's final draft).

    The Impact of Non-coherent Buffers on Lazy Hardware Transactional Memory Systems

    Abstract When supported in silicon, transactional memory (TM